Choosing the best non-parametric estimator to calculate richness in benthic macroinvertebrate databases


SUMMARY. Non-parametric estimators make it possible to compare the estimated richness of data sets of diverse origin. However, because their behavior depends on the abundance distribution of the data set, choosing one of them is a difficult decision. This work recovers some criteria from the literature for choosing the most suitable estimator for benthic river macroinvertebrates and offers some tools for applying them. Four incidence estimators and two abundance estimators were applied to a regional inventory at family and genus level. The evaluation considered: the sub-sample size needed to estimate the observed richness, the constancy of that sub-sample size, the absence of erratic behavior, and the similarity in curve shape among the different data sets. Among the incidence estimators, the best was Jack1; among the abundance estimators, ACE was best for low-richness samples and Chao1 for high-richness samples. The uniform shape of the curves made it possible to describe general behavior sequences, which can be used as a reference for comparing curves from small samples and inferring their probable behavior (and richness) if the sample were larger. These results can be very useful for environmental management and update the state of regional knowledge of macroinvertebrates.


KEY WORDS. Estimated richness. Neotropical rivers. Environmental management.


ABSTRACT. Non-parametric estimators allow richness estimates to be compared among data sets from heterogeneous sources. However, since estimator performance depends on the species-abundance distribution of the sample, choosing one estimator over another is a difficult issue. The present study recovers and revalues some criteria already present in the literature for choosing the most suitable estimator for stream macroinvertebrates, and provides some tools to apply them. Two abundance and four incidence estimators were applied to a regional database at family and genus level. They were evaluated under four criteria: sub-sample size required to estimate the observed richness; constancy of the sub-sample size; lack of erratic behavior; and similarity in curve shape across different data sets. Among incidence estimators, Jack1 had the best performance. Among abundance estimators, ACE was the best when observed richness was low and Chao1 when observed richness was high. The uniformity of curve shapes made it possible to describe general sequences of curve behavior that could act as references for comparing estimates from small databases and for inferring the likely behavior of the curve (i.e. the expected richness) if the sample were larger. These results can be very useful for environmental management and update the state of knowledge of regional macroinvertebrates.


KEY WORDS. Richness estimation. Neotropical streams. Environmental management.


INTRODUCTION 


Species richness is a widely used measure of biodiversity because it is relatively easy to measure and is well understood by researchers, managers and society (Hellmann & Fowler, 1999). Unfortunately, it depends heavily on sample size, so a simple count of the species present in a sample will almost certainly underestimate true richness unless a census of the community of interest is made (Hellmann & Fowler, 1999). Because exhaustive sampling is impractical or rarely supported (especially in tropical invertebrate, microbial or plant communities) (Hellmann & Fowler, 1999; Gotelli & Colwell, 2001), estimating richness from available biological inventories has received progressively more attention over the last decade, in both theoretical and empirical respects (Walther & Morand, 1998; Heyer et al., 1999; Petersen et al., 2003; Chao et al., 2005). Much of the importance of these methods lies in making richness estimates comparable among data sets from different regions, seasons, methodologies or sampling efforts (Jiménez-Valverde & Hortal, 2003).

Among the methodologies currently used for this task, four groups can be distinguished: non-parametric estimators, fitting of species-abundance distributions, species accumulation curves and species-area curves (Hortal et al., 2006).

Here, I will focus only on the non-parametric estimators. These indices use the abundance or incidence of rare species in the samples to estimate the total number of species, using a previously formulated non-parametric model (e.g. Chao & Bunge, 2002; Sørensen et al., 2002; Chiarucci et al., 2003; Rosenzweig et al., 2003). There are two kinds of non-parametric estimators: abundance and incidence models.

The abundance model considers the number of individuals representing each species in the sample, and the incidence model considers the number of samples in which each species is present (i.e. presence-absence data). These methods use the number of rare species to estimate the likely number of undiscovered species.

Since the abundance and incidence of rare species change as sampling effort increases, these methods are expected to estimate different species richness values as sampling effort increases. Moreover, as different populations have different species-abundance distributions, estimator performance should also depend on the species-abundance distribution of the data set (Bunge & Fitzpatrick, 1993; Soberón & Llorente, 1993; Colwell & Coddington, 1994; Walther & Morand, 1998). Thus, given a particular data set, the preference for one method or another should be based on the extent to which each meets the characteristics of an ideal estimator.

Many authors (e.g. Palmer, 1990; Hellmann & Fowler, 1999; Walther & Morand, 1998; Hortal et al., 2006) agree that some of these characteristics are: independence of sample size (amount of sampling effort carried out); insensitivity to unevenness in species distributions; insensitivity to sample order (Chazdon et al., 1998); and insensitivity to heterogeneity in the sample units used among studies, so that richness values obtained from different survey strategies can be compared, which is often needed in macroecological studies (Hortal et al., 2006).

Unfortunately, meeting these conditions is a very difficult challenge for an estimator, and assessing them is also difficult, especially when quick answers about species richness are needed. However, some simple and practical criteria can offer an overview of the performance of estimators for taxa for which exhaustive statistical assessments have not yet been carried out.

This study seeks to recover and revalue some criteria already present in the literature and to provide some tools to apply them, in order to help researchers, managers and policy makers choose the most suitable non-parametric richness estimator for stream macroinvertebrate communities.

Furthermore, these criteria are applied to a large and heterogeneous regional database of benthic macroinvertebrates from a subtropical Andean area. Results are exemplified at two hierarchical levels and several levels of sampling effort, making these tools applicable to a wide range of situations. Additionally, this work provides valuable information on the regional state of knowledge of the taxa analyzed.


MATERIAL AND METHODS 


Data set 
A large database of benthic macroinvertebrates from the Yungas ecoregion (NW Argentina, Neotropical region) was used to calculate the richness of Ephemeroptera, Trichoptera, Coleoptera, Diptera and Prostigmata (Hydrachnidia) at family and genus level. This database belongs to the Instituto de Biodiversidad Neotropical (IBN) and includes data on all the specimens collected over a decade by its members, who made significant contributions to the knowledge of the macroinvertebrates of northwestern Argentina in both systematic and ecological aspects (Domínguez & Fernández, 2009). Since the specimens were collected for diverse purposes, records come from a variety of sampling methods. Other orders of macroinvertebrates present in this database were discarded due to low representativeness.

All collection sites available for each order at each taxonomic level were used as units of sampling effort, regardless of the number of visits, season or sampling method used (mist nets, light traps, hand picking, D-frame net, Surber net, etc.).

For the purposes of this work, a “site” was defined as a stretch of river or stream (rhithron) approximately 200 m long, with associated data on geographic coordinates, elevation, ecoregion and date of collection. This unit of sampling effort was chosen to account for natural levels of sample heterogeneity (patchiness) in the data (Gotelli & Colwell, 2001). However, in order to make estimates comparable among different groups, graphical results were rescaled to the percentage of sites (family level) or individuals (genus level) as a measure of sampling effort.
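The rescaling can be illustrated with a minimal sketch. It assumes, following the axis descriptions in the figure captions, that each accumulation curve is stored as a list of cumulative effort values with matching richness estimates, that the x axis maps the total effort of the largest data set to 100%, and that the y axis maps each curve's own final estimate to 100%. The function name `rescale_curve` and the example numbers are hypothetical.

```python
def rescale_curve(effort, estimates, max_effort):
    """Rescale one accumulation curve for cross-order comparison.

    effort: cumulative sampling effort (e.g. sites or individuals) at each point.
    estimates: estimated richness at each point.
    max_effort: total effort of the largest data set, mapped to 100% on the x axis.
    The final estimate of this curve is mapped to 100% on the y axis.
    """
    final = estimates[-1]
    x = [100.0 * e / max_effort for e in effort]
    y = [100.0 * s / final for s in estimates]
    return x, y


# Hypothetical curve: five points over 40 sites; the largest data set has 80 sites.
x, y = rescale_curve([8, 16, 24, 32, 40], [10.0, 14.0, 17.0, 19.0, 20.0], 80)
print(x)  # [10.0, 20.0, 30.0, 40.0, 50.0]
print(y)  # [50.0, 70.0, 85.0, 95.0, 100.0]
```

Because every curve ends at 100% on the y axis, orders with very different absolute richness can be overlaid and compared by shape alone.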


Richness estimators 

Six non-parametric estimators were calculated: four incidence estimators (Jackknife 1, Jackknife 2, Chao2 and Incidence-based Coverage Estimator) and two abundance estimators (Abundance-based Coverage Estimator and Chao1).

Incidence (presence-absence) estimators use counts of “uniques” and “duplicates”, i.e. species present in only one or two samples, respectively (first-order jackknife or Jack1, and second-order jackknife or Jack2; Burnham & Overton, 1978, 1979). The Chao2 estimator (Chao, 1987; Colwell, 1997) takes into account rare species and the total number of species observed in the sample to calculate richness. The Incidence-based Coverage Estimator (ICE) (Lee & Chao, 1994; Colwell, 1997) takes into account both rare and common species, considering as “common” those that occur in more than 10 samples (the default) or in more than a number of samples chosen by the user.
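As an illustration, the four incidence estimators can be sketched in Python, assuming the standard forms documented in the EstimateS User's Guide (Appendix B) and a data layout of one set of taxa per sample. All function names are hypothetical. Two further assumptions: the classic Chao2 form is used, with a fallback to the bias-corrected form only when there are no duplicates; and each function returns a single estimate for the pooled samples. EstimateS itself reports means over randomized sample orders, so its curves will not match these single values point by point.

```python
from collections import Counter


def incidence_summary(samples, k=10):
    """Summarise an incidence data set.

    samples: list of sets, one per sample (site), with the taxa recorded there.
    k: taxa found in at most k samples are "infrequent" (default threshold: 10).
    """
    occ = Counter(t for s in samples for t in set(s))  # samples per taxon
    infrequent = {t for t, j in occ.items() if j <= k}
    return {
        "k": k,
        "n": len(samples),                   # total number of samples
        "s_obs": len(occ),                   # observed richness
        "q": Counter(occ.values()),          # q[j] = taxa found in exactly j samples
        "s_infr": len(infrequent),
        "s_freq": len(occ) - len(infrequent),
        "incid_infr": sum(occ[t] for t in infrequent),  # incidences of infrequent taxa
        "samples_infr": sum(1 for s in samples if set(s) & infrequent),
    }


def jack1(d):
    n = d["n"]
    return d["s_obs"] + d["q"][1] * (n - 1) / n


def jack2(d):
    n, q = d["n"], d["q"]
    return d["s_obs"] + q[1] * (2 * n - 3) / n - q[2] * (n - 2) ** 2 / (n * (n - 1))


def chao2(d):
    q1, q2, n = d["q"][1], d["q"][2], d["n"]
    if q2 > 0:
        return d["s_obs"] + q1 ** 2 / (2 * q2)  # classic form
    # Bias-corrected form, used here only to avoid dividing by zero when Q2 = 0.
    return d["s_obs"] + (n - 1) / n * q1 * (q1 - 1) / (2 * (q2 + 1))


def ice(d):
    q, q1 = d["q"], d["q"][1]
    n_inc, n_smp = d["incid_infr"], d["samples_infr"]
    # Sample incidence coverage; it is zero (estimator undefined) if every
    # infrequent incidence is a unique.
    c_ice = 1 - q1 / n_inc
    s_ratio = d["s_infr"] / c_ice
    sum_jj = sum(j * (j - 1) * q[j] for j in range(1, d["k"] + 1))
    gamma2 = max(s_ratio * n_smp / (n_smp - 1) * sum_jj / n_inc ** 2 - 1, 0.0)
    return d["s_freq"] + s_ratio + q1 / c_ice * gamma2


# Hypothetical data: four sites, taxa labelled A-F.
samples = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "E"}, {"A", "F"}]
d = incidence_summary(samples)
print(round(jack1(d), 3), round(jack2(d), 3), round(chao2(d), 3), round(ice(d), 3))
# 8.25 9.083 8.25 10.125
```

Even on this tiny example the four estimators disagree, which is why the study compares them under explicit criteria rather than trusting any single value.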

Abundance estimators base their calculations primarily on “singleton” and “doubleton” species, i.e. the number of species represented by just one or two individuals (Chao1; Chao, 1984; Colwell, 1997). Besides singletons and doubletons, the Abundance-based Coverage Estimator (ACE) (Chao et al., 1993; Colwell, 1997) considers abundant species, i.e. those represented by more than 10 individuals by default.
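A matching sketch for the two abundance estimators follows. It assumes the standard EstimateS forms and pooled abundances (individuals per taxon, all samples combined). The classic Chao1 form is used, falling back to a common bias-corrected form only when there are no doubletons; EstimateS versions differ in these details. Function names and data are hypothetical.

```python
from collections import Counter


def chao1(abundances):
    """abundances: pooled number of individuals per taxon."""
    counts = [a for a in abundances if a > 0]
    f = Counter(counts)  # f[i] = number of taxa with exactly i individuals
    s_obs, f1, f2 = len(counts), f[1], f[2]
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)  # classic form
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))  # bias-corrected, no doubletons


def ace(abundances, k=10):
    """ACE with rare taxa defined as those with at most k individuals."""
    counts = [a for a in abundances if a > 0]
    rare = [a for a in counts if a <= k]
    s_abund = len(counts) - len(rare)
    s_rare, n_rare = len(rare), sum(rare)
    f = Counter(rare)
    # Sample abundance coverage; zero (ACE undefined) if all rare taxa are singletons.
    c_ace = 1 - f[1] / n_rare
    s_ratio = s_rare / c_ace
    sum_ii = sum(i * (i - 1) * f[i] for i in range(1, k + 1))
    gamma2 = max(s_ratio * sum_ii / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_ratio + f[1] / c_ace * gamma2


data = [1, 1, 2, 3, 5, 12]  # hypothetical pooled abundances of six taxa
print(chao1(data))           # 8.0
print(round(ace(data), 4))   # 7.6545
```

Only the taxon with 12 individuals counts as abundant here; ACE models the other five as a rare group, while Chao1 looks only at singletons and doubletons.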

All estimators were calculated with the program EstimateS 7.0.1 (Colwell, 1997); their mathematical expressions are given in the Appendix. To run the analyses, the program requires the data to be uploaded as an abundance spreadsheet (to calculate abundance and/or incidence estimators) or as an incidence spreadsheet (incidence estimators only). This spreadsheet must specify the number of samples and taxa and the abundance or presence of each taxon in each sample. The program accepts six spreadsheet formats that meet these conditions, which is very convenient for the user. Data do not require any prior statistical transformation.
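An abundance spreadsheet supports both kinds of estimator because counts can always be reduced to presence/absence and pooled totals, whereas incidence data cannot be turned back into counts. A minimal sketch of that reduction follows. It uses a generic taxon-by-sample layout, not EstimateS's actual input formats, and the counts are hypothetical.

```python
def to_incidence(abundance_matrix):
    """abundance_matrix: dict taxon -> list of counts, one per sample.

    Returns taxon -> list of 0/1 presence values (incidence data).
    """
    return {taxon: [1 if c > 0 else 0 for c in counts]
            for taxon, counts in abundance_matrix.items()}


def pooled_abundances(abundance_matrix):
    """Total individuals per taxon across all samples (input for Chao1/ACE)."""
    return {taxon: sum(counts) for taxon, counts in abundance_matrix.items()}


# Hypothetical counts for three families in three samples.
abund = {"Baetidae": [12, 0, 3], "Elmidae": [1, 0, 0], "Hydropsychidae": [4, 7, 2]}
print(to_incidence(abund))
# {'Baetidae': [1, 0, 1], 'Elmidae': [1, 0, 0], 'Hydropsychidae': [1, 1, 1]}
print(pooled_abundances(abund))
# {'Baetidae': 15, 'Elmidae': 1, 'Hydropsychidae': 13}
```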


Evaluation of estimators 

Bias and accuracy of the estimated richness are the most popular measures for evaluating estimation methods. To use these measures it is necessary to choose an a priori sub-sample size, which is a difficult choice in rich assemblages such as macroinvertebrate communities, where the estimated richness is strongly dependent on sample size. For this reason, in an evaluation of non-parametric estimators applied to benthic macroinvertebrates, Melo & Froehlich (2001) opted not to use such bias and accuracy statistics and instead used four criteria they argued are more practical and realistic. These criteria were:

1) Sub-sample size required to estimate the observed richness in the total sample. The smaller the sub-sample size, the better the performance of the estimator. This characteristic can be interpreted as an indication of the estimator's capability to reach a reliable value of richness with a small sampling effort.

2) Constancy of the sub-sample size needed to estimate the total observed richness, measured as 1 standard deviation (SD) of the previous criterion (i.e. the SD of the estimate that equals the observed richness at the minimum percentage of sampling effort).

3) Lack of erratic behavior in curve shape (specifically, large variations of estimates for closely similar sub-sample sizes), which is taken to indicate greater stability and therefore greater reliability of the estimate. Melo & Froehlich (2001) made a qualitative categorization of this criterion (good stability/bad stability), and here I propose a very simple quantitative tool to measure this characteristic. This tool consists of the sum of the absolute values of the differences between Sobs (i.e. the “observed richness”) and the three estimates preceding and the three estimates following the point at which the estimate equals Sobs. In this way, a measure of the stability of the curve is obtained over a segment of seven points whose middle position is the Sobs point.

Hereafter I will refer to these first three criteria as “quantitative criteria”.

4) Similarity in curve shape across different data sets. This is a highly valued characteristic, since it makes the behavior of the estimator predictable and easy to interpret when it is applied to new data sets. Although it is mentioned by Melo & Froehlich (2001), some additional details are given here. In this study, uniformity or similarity of the shape of the curves was evaluated qualitatively, considering: the tendency of the slope at different levels of sampling effort, the presence of peaks of over- or underestimation, and the position of the maximum or minimum estimates with respect to the level of sampling effort.
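The three quantitative criteria can be computed from a single estimator curve, as in the following minimal sketch. It assumes the curve is given as increasing sub-sample sizes (in % of the total sample) with the mean estimate and its SD at each size, as produced by randomized accumulation runs. It also assumes that "the point at which the estimate equals Sobs" means the first point whose estimate is greater than or equal to Sobs, and that the seven-point window is truncated at the ends of the curve. The function name and example values are hypothetical.

```python
def quantitative_criteria(effort_pct, mean_est, sd_est, s_obs, half_window=3):
    """Criteria 1-3 for one estimator curve (hypothetical helper).

    effort_pct: sub-sample sizes as % of the total sample, in increasing order.
    mean_est, sd_est: mean estimated richness and its SD at each sub-sample size.
    s_obs: observed richness of the total sample.
    Returns (% samples, SD, stability), or None if the curve never reaches s_obs.
    """
    # Criterion 1: first sub-sample size at which the estimate reaches s_obs.
    idx = next((i for i, s in enumerate(mean_est) if s >= s_obs), None)
    if idx is None:
        return None
    # Criterion 2: SD of the estimate at that sub-sample size.
    sd_at_idx = sd_est[idx]
    # Criterion 3: sum of |estimate - s_obs| over the seven-point segment centred
    # on idx (three points before, idx itself, three after), truncated at the ends.
    lo = max(0, idx - half_window)
    hi = min(len(mean_est), idx + half_window + 1)
    stability = sum(abs(s - s_obs) for s in mean_est[lo:hi])
    return effort_pct[idx], sd_at_idx, stability


# Hypothetical curve with ten sub-sample sizes; observed richness = 15.
effort = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
means = [5.0, 8.0, 11.0, 13.0, 15.5, 14.5, 15.0, 15.2, 15.0, 15.0]
sds = [2.0, 1.8, 1.5, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
pct, sd, stab = quantitative_criteria(effort, means, sds, 15)
print(pct, sd, round(stab, 2))  # 50 1.0 14.2
```

Lower values are better on all three outputs, which is what allows them to be ranked in the same direction.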

The six non-parametric estimators were applied to the dataset and then evaluated under the four criteria mentioned, in order to provide further evidence for deciding whether they are suitable for datasets of benthic macroinvertebrates in subtropical areas.

Performance of estimators was assessed according to the following categories: incidence estimators at family level; incidence estimators at genus level; abundance estimators at family level; and abundance estimators at genus level.

For each of the quantitative criteria (i.e. percentage of samples to achieve Sobs, standard deviation, stability), the incidence estimators were ranked from 1 to 4 according to the results. A score of 1 was assigned to the estimator with the best performance (i.e. the lowest value in that criterion) and 4 to the one with the worst performance (i.e. the highest value in that criterion). The final score for each estimator was obtained by adding those ranking positions. The same procedure was applied to abundance estimators, except that only scores of 1 and 2 were possible.
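The scoring procedure can be sketched as follows, using the family-level Diptera values for the incidence estimators reported in Table I. The function names are hypothetical. Giving tied values the same (lowest) rank is an assumption, since the text does not say how ties were handled. In a few cells Table I gives equal scores to values that differ only slightly (e.g. ICE and Jack2 for SD in Coleoptera), which this strict ranking would not reproduce.

```python
def rank_scores(values):
    """Score estimators on one criterion: 1 = lowest (best) value.

    Tied values share the lowest rank (an assumption).
    """
    ordered = sorted(values.values())
    return {name: ordered.index(v) + 1 for name, v in values.items()}


def total_scores(criteria):
    """Sum each estimator's scores over all criteria (the 'ss' of the tables)."""
    totals = {}
    for values in criteria.values():
        for name, score in rank_scores(values).items():
            totals[name] = totals.get(name, 0) + score
    return totals


# Family-level incidence results for Diptera (Table I).
diptera_family = {
    "% samples": {"ICE": 40.00, "Chao2": 24.44, "Jack1": 26.66, "Jack2": 19.30},
    "Stability": {"ICE": 1.60, "Chao2": 2.99, "Jack1": 1.86, "Jack2": 4.11},
    "SD": {"ICE": 1.81, "Chao2": 3.21, "Jack1": 1.27, "Jack2": 2.98},
}
totals = total_scores(diptera_family)
print(sorted(totals.items(), key=lambda kv: kv[1]))
# [('Jack1', 6), ('ICE', 7), ('Jack2', 8), ('Chao2', 9)]
```

The result matches the ranking reported for Diptera at family level (1. Jack1, ss=6; 2. ICE, ss=7).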


RESULTS 


Regarding the three quantitative criteria, the performance of the estimators varied depending on the taxa considered. In Tables I and II, numbers in parentheses represent the score assigned to each estimator according to the value obtained in each criterion. For example, for incidence estimators at family level under the “SD” criterion for the order Coleoptera (Table I), Jack1 obtained the lowest value (1,68) and Chao2 the highest (6,44), so their scores were (1) and (4), respectively. “Ranking position” is the sum of all scores obtained by each estimator; e.g. for ICE in Coleoptera this is (3)+(1)+(2)=6, and since 6 is the lowest sum, ICE is the best-performing estimator for that order. Estimators with the best ranking positions in most orders are considered those of best overall performance. For each category, these were:

- Incidence estimators at family level (see Table I): Jack1 is the only one that always took the first or second ranking position.

- Incidence estimators at genus level (see Table I): Jack1 took the first position in four of the five cases, followed by ICE, with very close scores.

- Abundance estimators at family level (see Table II): ACE or Chao1, with very close scores.

- Abundance estimators at genus level (see Table II): Chao1 took the first position in all cases.

Most of the estimators performed differently depending on the sample under study, except Chao1, which was always the most stable, and Jack2, which reached the final value of estimated richness with a lower sampling effort in almost all cases.


Regarding similarity in curve shape, the behavior of the estimators shows the following general sequences:


ICE (see Fig. 1a): General behavior: This estimator presented its maximum peak followed by a minimum peak in the first 10-20% of samples, and then showed gradual growth up to the final estimate. General sequence variations: In Coleoptera, after an overestimation peak there was a progressive increase in the slope of the curve until it reached the maximum estimated value.


Chao2 (see Fig. 1b): General behavior: At family level, the curves of the three least sampled orders (Diptera, Prostigmata and Coleoptera) showed a very similar pattern, although out of phase in sampling effort. The overall behavior sequence would be: pronounced growth (in the first 10-20% of the samples), an overestimation plateau, and a decrease to the final value (see Fig. 1). At genus level (see Fig. 1b), except for Trichoptera, the curves described a pattern similar to that seen for families, but much more blurred by the large number of small peaks along the range and because the described sequence occurs at very different levels of sampling for each order. General sequence variations: At family level, the curves of the better sampled orders (Trichoptera and Ephemeroptera) tended toward continuous growth (i.e. without peaks) from 20% of sampling effort onward. The Trichoptera genus curve, on the other hand, did not resemble those described above, as it showed cubic growth from 66% of samples to the end of the x axis (R² = 0.99).


Jack1 (see Fig. 1c): General behavior: At genus level, there was no clear distinction between well-sampled and undersampled orders, all of them showing logarithmic growth. At family level this growth was very pronounced in the three least sampled orders, where the curves fit a logarithmic curve very well (R² = 0.88 for Diptera and R² = 0.99 for Coleoptera and Prostigmata at family level) and never became asymptotic. General sequence variations: At family level in the better sampled orders (Trichoptera and Ephemeroptera), and at genus level for Coleoptera, the curves showed much steeper growth in the first 10% of sampling, after which growth became gradual until the final estimate was reached.


Table I. Quantitative evaluation of incidence estimators. Values 1 to 4 in parentheses indicate the score assigned to the estimator in each criterion (% samples, stability, SD). The final ranking position of the estimators was obtained by summing their scores (ss) in each order.

Family level (each cell: % samples / Stability / SD)
Estimator | Coleoptera | Diptera | Ephemeroptera | Prostigmata | Trichoptera
ICE | 30,86 (3) / 1,61 (1) / 3,64 (2) | 40,00 (4) / 1,60 (1) / 1,81 (2) | 58,82 (3) / 0,21 (2) / 0,92 (3) | 42,85 (3) / 4,86 (4) / 2,69 (2) | 73,17 (3) / 0,32 (2) / 0,6 (3)
Chao2 | 25,92 (2) / 2,55 (3) / 6,44 (4) | 24,44 (2) / 2,99 (3) / 3,21 (4) | 100 (4) / 0,07 (1) / 0,43 (1) | 38,09 (2) / 3,59 (3) / 4,77 (4) | 96,34 (4) / 0,12 (1) / 0,47 (1)
Jack1 | 38,27 (4) / 1,75 (2) / 1,68 (1) | 26,66 (3) / 1,86 (2) / 1,27 (1) | 54,90 (2) / 0,40 (3) / 0,48 (2) | 42,86 (3) / 3,23 (2) / 1,56 (1) | 45,12 (2) / 0,39 (3) / 0,53 (2)
Jack2 | 24,69 (1) / 2,80 (4) / 3,63 (2) | 19,30 (1) / 4,11 (4) / 2,98 (3) | 40,20 (1) / 0,58 (4) / 1,40 (4) | 30,95 (1) / 2,91 (1) / 3,32 (3) | 39,02 (1) / 0,76 (4) / 1,69 (4)
Ranking position | 1. ICE (ss=6); 2. Jack1/Jack2 (ss=7) | 1. Jack1 (ss=6); 2. ICE (ss=7) | 1. Chao2 (ss=6); 2. Jack1 (ss=7) | 1. Jack2 (ss=5); 2. Jack1 (ss=6) | 1. Chao2 (ss=6); 2. Jack1 (ss=7)

Genus level (each cell: % samples / Stability / SD)
Estimator | Coleoptera | Diptera | Ephemeroptera | Prostigmata | Trichoptera
ICE | 35,13 (3) / 3,08 (1) / 9,31 (3) | 31,11 (3) / 5,95 (1) / 6,82 (2) | 48,57 (4) / 2,30 (2) / 3,71 (1) | 43,55 (3) / 5,01 (2) / 4,18 (1) | 50,61 (4) / 2,10 (2) / 3,01 (1)
Chao2 | 22,97 (1) / 12,88 (4) / 15,12 (4) | 13,33 (1) / 15,05 (3) / 18,54 (4) | 12,38 (2) / 7,83 (4) / 13,01 (4) | 29,03 (1) / 11,83 (4) / 10,90 (4) | 20,99 (2) / 2,43 (3) / 8,44 (4)
Jack1 | 45,94 (4) / 5,63 (2) / 8,06 (1) | 33,33 (4) / 8,38 (2) / 4,74 (1) | 30,08 (1) / 1,39 (1) / 2,53 (2) | 40,32 (2) / 4,54 (1) / 3,35 (2) | 30,86 (3) / 1,76 (1) / 3,19 (2)
Jack2 | 29,73 (2) / 9,95 (3) / 7,64 (2) | 20,00 (2) / 17,20 (4) / 8,74 (3) | 30,10 (1) / 3,90 (3) / 5,93 (3) | 29,03 (1) / 7,65 (3) / 5,63 (3) | 16,05 (1) / 8,28 (4) / 7,02 (3)
Ranking position | 1. ICE/Jack1/Jack2 (ss=7) | 1. ICE (ss=6); 2. Jack1 (ss=7) | 1. Jack1 (ss=4); 2. Jack2 (ss=6) | 1. Jack1 (ss=5); 2. ICE/Jack2 (ss=6) | 1. Jack1 (ss=6); 2. ICE (ss=7)


Table II. Quantitative evaluation of abundance estimators. Values 1 and 2 in parentheses indicate the score assigned to the estimator in each criterion (% samples, stability, SD). The final ranking position of the estimators was obtained by summing their scores (ss) in each order.

Family level (each cell: % samples / Stability / SD)
Estimator | Ephemeroptera | Prostigmata | Trichoptera | Coleoptera | Diptera
ACE | 100,00 (1) / 0,07 (1) / 0,00 (1) | 46,43 (1) / 1,63 (2) / 1,83 (2) | 40,90 (1) / 0,66 (2) / 1,42 (2) | 35,82 (1) / 4,61 (2) / 6,93 (1) | 38,09 (1) / 1,90 (1) / 3,81 (1)
Chao1 | 100,00 (1) / 0,07 (1) / 0,45 (2) | 85,71 (2) / 0,38 (1) / 0,72 (1) | 100,00 (2) / 0,03 (1) / 0,48 (1) | 35,82 (1) / 3,14 (1) / 7,10 (2) | 44,44 (2) / 2,30 (2) / 4,37 (2)
Ranking position | ACE (ss=3) | Chao1 (ss=4) | Chao1 (ss=4) | ACE/Chao1 (ss=4) | ACE (ss=3)

Genus level (each cell: % samples / Stability / SD)
Estimator | Ephemeroptera | Prostigmata | Trichoptera
ACE | 39,70 / 2,87 / 5,13 | 9,37 / 5,02 / 3,48 | 80,30 / 1,75 / 1,61
Chao1 | 39,70 / 1,67 / 5,74 | 43,75 / 4,76 / 5,83 | 66,66 / 1,49 / 3,86
Ranking position | ACE/Chao1 (ss=4) | Chao1 (ss=4) | Chao1 (ss=4)


Jack2 (see Fig. 1d): General behavior: The curves behaved very similarly to Jack1 in all aspects, except that at family level they showed small peaks along their course, which nevertheless did not reverse the general tendency toward logarithmic growth. At genus level, this instability was less obvious and, unlike the family-level estimates, three orders reached an asymptote (Diptera, Ephemeroptera and Prostigmata). There were no important exceptions to this general behavior.


ACE (see Figs. 2a1 and 2b1) and Chao1 (see Figs. 2a2 and 2b2): General behavior: at both family and genus level they followed logarithmic growth, reaching or tending to an asymptote at different levels of sampling effort in the best sampled orders. In all cases the growth was smooth, with no evidence of peak fluctuation.

Fig. 1. Incidence estimators' curves for family and genera. a) ICE, b) Chao2, c) Jack1 and d) Jack2. The X axis is rescaled to percentage of sampling effort: 100% = the maximum value of the largest sample (Ephemeroptera). The Y axis is rescaled to percentage of estimated richness: 100% = final value of richness in each order (full line).

Fig. 2. Abundance estimators' curves for family and genera. a) ACE and b) Chao1. The X axis is rescaled to percentage of sampling effort: 100% = the maximum value of the largest sample (Ephemeroptera). The Y axis is rescaled to percentage of estimated richness: 100% = final value of richness in each order (full line).


Application to the local data set 
Incidence data: At family level (see Fig. 1a), ICE (and Chao2) estimated a final richness value equal to Sobs for Diptera, Ephemeroptera and Trichoptera, suggesting a high level of completeness of these inventories. For Prostigmata, ICE estimates ranged between 15 and 17% above Sobs. For Coleoptera, Jack1 suggested the existence of 30% more families than currently known. At genus level (see Fig. 1a), ICE was close to Sobs for Diptera and Trichoptera. The estimated richness suggests that between 10% (Trichoptera) and 12% (Ephemeroptera and Diptera) of genera remain unknown in these inventories, while this value grew to 20% for Prostigmata.

Regarding Coleoptera, Jack1 suggested a very high percentage of unknown genera (near 48%).

Abundance data: At family level (see Fig. 2), the estimates of the ACE and Chao1 indices coincided with, or were nearly identical to, the observed richness (Sobs) for Ephemeroptera, Trichoptera and Prostigmata. For Coleoptera and Diptera at family level, both estimators suggested that approximately 35% and 17% of families, respectively, remain unknown.

At genus level (see Fig. 2b), both abundance indices estimated very similar richness values for each of the three studied orders, with Prostigmata being the group with the highest percentage of unknown genera (15%).
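The percentages above are reported in two ways: as an excess over Sobs ("above Sobs", "more families than currently known") and as a share that remains unknown. Because these two ratios differ for the same numbers, the following minimal sketch computes both. Which denominator was used for each figure is not stated, so both formulas are assumptions, and the example values are hypothetical.

```python
def percent_above_sobs(s_est, s_obs):
    """Excess of the estimate over observed richness, relative to Sobs."""
    return 100.0 * (s_est - s_obs) / s_obs


def percent_unknown(s_est, s_obs):
    """Share of estimated total richness not yet recorded (100 - completeness)."""
    return 100.0 * (s_est - s_obs) / s_est


# Hypothetical order: 20 observed genera, estimator predicts 25.
print(percent_above_sobs(25, 20))  # 25.0
print(percent_unknown(25, 20))     # 20.0
```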


DISCUSSION 


The impact of human activities on ecological systems is well illustrated by changes in land use arising from urbanization, whose consequences are a serious threat to biodiversity conservation (Gonzalez-Oreja et al., 2010). Ecologists face the challenge of quantifying this impact and of proposing precise tools and measures to preserve biodiversity. In this context, groups of organisms that act as biodiversity indicators, like many benthic macroinvertebrates, are of great importance because they enable these changes to be monitored (Valladares et al., 2010).

Non-parametric estimators can help complete inventories in less time and at lower cost (Petersen & Meier, 2003), since they indicate how much sampling effort may be enough to reach a representative inventory. Hence the importance of assessing their reliability, so that they can be adjusted appropriately for each group and region. Moreover, when assessing the relevance of a single area in terms of species richness, endemism or conservation status, using richness as a tool for decision making can be misleading without a measure of how complete the lists are (Soberón & Llorente, 1993). Measuring completeness is therefore the safest approach for dealing with taxonomic, geographic or ecological biases. This is especially important when inventories are derived from heterogeneous sources such as non-standardized samplings or bibliographic references (Valladares, 2010). Even when the data are imperfect (e.g. museum collection lists), if the appropriate technique is used and the results are interpreted strictly, the quality of species lists can be evaluated in order to locate priority areas for conservation (Heyer et al., 1999). The assessment of inventory completeness is a powerful tool that non-parametric estimators provide. Moreover, a widely recognized program for calculating them (EstimateS in all versions, Colwell, 1994) is freely available on the web, is very easy to use and does not require the user to carry out any statistical procedures for the calculation.

In this study, among the incidence estimators, the one with the best overall performance was Jack1, followed by ICE and Chao2. These results differ slightly from those of Melo & Froehlich (2001), who recommended Jack2 as their first choice for benthic studies. In the present study, Jack2 performed similarly to ICE at genus level, although not as well. At family level, Jack2 performance was very poor. In Melo & Froehlich (2001), ICE and Chao2 showed good performance, as in the present study. Gonzalez-Oreja et al. (2010) studied the performance of non-parametric estimators with data on birds from urban areas in the city of Puebla, Mexico. They concluded that Jack1 was the estimator with the best general performance, even when they evaluated it under some criteria they called “hard”, i.e. criteria that are statistically very rigorous.

Regarding abundance estimators, the analysis did not reveal a single estimator with better overall performance; instead, performance depended on whether they were applied at family or genus level. Since the number of samples differed little or not at all between family and genus level within the same order (see Table II), this could indicate that some estimators perform better on low-richness samples (family level) and others are more effective when observed richness reaches higher values (genus level). Thus, according to the results of this study, the most advisable estimator for low-Sobs samples would be ACE, whereas Chao1 would be the best for high-Sobs samples.

The analysis of curve shapes leads me to advise against using estimators with more erratic behavior in the first portion of the sample, such as ICE and Chao2, when the sample to be analyzed is small, in agreement on this point with Rico et al. (2005).

Nonetheless, from the description of curve shapes it can be said that all of them are quite stable, with variations related to the representativeness achieved by each order in relation to sampling effort. This uniformity in curve form across different data sets and hierarchical levels made it possible to describe general sequences that turn out to be very useful from a predictive point of view. For example, they can act as references for comparing estimates from small databases of macroinvertebrates and then inferring the possible behavior of the curve (and therefore the expected richness) if the sample were larger.

In a regional context, non-parametric estimates showed good results, i.e. high completeness of the inventories of Trichoptera, Ephemeroptera and Prostigmata at both hierarchical levels, while they revealed a great lack of knowledge in Coleoptera. The latter can be attributed to the fact that, among Coleoptera, only Elmidae is widely studied. Estimates for Diptera produced very good results with respect to inventory completeness; however, these results should be taken with caution, since it is a very diverse group and a much greater richness than that recorded would intuitively be expected in this area. As in Coleoptera, a possible explanation could be the low level of taxonomic determination that can be reached for this group in the region.


CONCLUSION 


The results of this study support proposing the non-parametric estimators Jack1, ACE and Chao1 as excellent tools both for estimating richness and for measuring the completeness of inventories of some of the most conspicuous groups of benthic macroinvertebrates. The application of the estimators to the local data set shows that the best known orders of the region are Trichoptera, Ephemeroptera and Prostigmata and, to a lesser degree, Coleoptera, while the Diptera inventory presented intermediate values of representativeness.
Appendix. Adapted from EstimateS User's Guide, Appendix B
(http://viceroy.eeb.uconn.edu/EstimateSPages/EstSUsersGuide/EstimateSUsersGuide.htm#AppendixB) 





Incidence estimators: 
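For reference, the standard incidence-based forms documented in the EstimateS User's Guide are written out below using the symbols defined in this Appendix, with S_obs the observed richness and k the frequency threshold (10 by default). The Chao2 line is the classic form; the guide also gives a bias-corrected variant, and the exact form used by a given EstimateS version should be checked against its documentation.

```latex
\begin{align*}
S_{\mathrm{ICE}} &= S_{\mathrm{freq}} + \frac{S_{\mathrm{infr}}}{C_{\mathrm{ice}}}
  + \frac{Q_1}{C_{\mathrm{ice}}}\,\gamma^2_{\mathrm{ice}},
  \qquad C_{\mathrm{ice}} = 1 - \frac{Q_1}{N_{\mathrm{infr}}} \\
\gamma^2_{\mathrm{ice}} &= \max\left[\frac{S_{\mathrm{infr}}}{C_{\mathrm{ice}}}
  \,\frac{n_{\mathrm{infr}}}{n_{\mathrm{infr}} - 1}
  \,\frac{\sum_{j=1}^{k} j(j-1)\,Q_j}{N_{\mathrm{infr}}^{2}} - 1,\; 0\right] \\
S_{\mathrm{Chao2}} &= S_{\mathrm{obs}} + \frac{Q_1^{2}}{2\,Q_2} \\
S_{\mathrm{Jack1}} &= S_{\mathrm{obs}} + Q_1\,\frac{n-1}{n} \\
S_{\mathrm{Jack2}} &= S_{\mathrm{obs}} + \frac{Q_1\,(2n-3)}{n} - \frac{Q_2\,(n-2)^{2}}{n\,(n-1)}
\end{align*}
```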
Where:

Q_1: number of uniques
Q_2: number of duplicates
n: total number of samples
S_freq: number of frequent species, i.e. species present in more than k samples (by default, each found in more than 10 samples)
S_infr: number of infrequent species, i.e. species present in k or fewer samples (by default, each found in 10 or fewer samples)
C_ice: sample incidence coverage estimator
N_infr: total number of incidences (occurrences) of infrequent species
n_infr: number of samples that have at least one infrequent species
Q_j: number of species that occur in exactly j samples (Q_1 is the frequency of uniques, Q_2 the frequency of duplicates)
Q: total frequency of uniques
γ²_ice: estimated coefficient of variation of the Q_j for infrequent species
Abundance estimators: 
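Likewise, the standard abundance-based forms from the EstimateS User's Guide are written out below with the symbols defined in this Appendix (S_obs is the observed richness; k is the abundance threshold, 10 by default). Chao1 is shown in its classic form; a bias-corrected variant also exists, so the version-specific documentation should be consulted.

```latex
\begin{align*}
S_{\mathrm{ACE}} &= S_{\mathrm{abund}} + \frac{S_{\mathrm{rare}}}{C_{\mathrm{ace}}}
  + \frac{F_1}{C_{\mathrm{ace}}}\,\gamma^2_{\mathrm{ace}},
  \qquad C_{\mathrm{ace}} = 1 - \frac{F_1}{N_{\mathrm{rare}}} \\
\gamma^2_{\mathrm{ace}} &= \max\left[\frac{S_{\mathrm{rare}}}{C_{\mathrm{ace}}}
  \,\frac{\sum_{i=1}^{k} i(i-1)\,F_i}{N_{\mathrm{rare}}\,(N_{\mathrm{rare}} - 1)} - 1,\; 0\right] \\
S_{\mathrm{Chao1}} &= S_{\mathrm{obs}} + \frac{F_1^{2}}{2\,F_2}
\end{align*}
```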
Where:
F_1: number of singleton species
F_2: number of doubleton species
S_abund: number of abundant species, i.e. species represented by more than k individuals (by default, each with more than 10 individuals)
S_rare: number of rare (non-abundant) species, i.e. species represented by k or fewer individuals (by default, each with 10 or fewer individuals)
C_ace: sample abundance coverage estimator
N_rare: total number of individuals in rare species
F_i: number of species that have exactly i individuals when all samples are pooled (F_1 is the frequency of singletons, F_2 the frequency of doubletons)
γ²_ace: estimated coefficient of variation of the F_i for rare species


